Multi-tenant data service in distributed file systems for big data analysis

ABSTRACT

Configuration of a multi-tenant distributed file system on a node. Various tenants and tenant clusters are correlated to a distributed file system, and the distributed file system communicates with various tenants through a connector service. The entire distributed file system exists on a physical node.

BACKGROUND

The present invention relates generally to the field of storage access and control, and more particularly to memory configuring.

In a converged system, virtualization provides elasticity of computing resources, storage space, and/or application mobility. A converged infrastructure groups information technology components into a software package. A virtualization container is a software package that includes a file system to install software on a server in a reliable fashion. An example of a virtualization container is Docker. Some virtualization containers include software library frameworks. A software library framework allows for distributed processing of large data sets using a programming model. One example of such a software library framework is Hadoop. A portable operating system interface maintains compatibility between various operating systems. A portable operating system interface defines a set of application programming interfaces. An example of a portable operating system interface standard is POSIX.

Big data analytics allows the analysis of technology despite the exponential growth and availability of data, including both structured data and unstructured data. Big data analytics has evolved in two directions: (i) relational database-based massively parallel processing; and (ii) software library framework-based analysis.

SUMMARY

According to an aspect of the present invention, there is a method, computer program product, and/or system that performs the following operations (not necessarily in the following order): (i) determining a first directory corresponding to a first tenant identifier in a set of tenant identifiers, wherein: (a) the first directory is organized using a first interface standard, and (b) the first tenant identifier corresponds to a first tenant of the first directory; (ii) assigning a connector service to the first directory and the first tenant identifier; (iii) determining a second directory corresponding to the connector service, wherein: (a) the second directory is organized using a second interface standard, (b) a first node contains a first set of files on the second directory, and (c) the first set of files corresponds to the first tenant; (iv) processing a first read/write request in a set of read/write requests using the connector service and the first node, wherein the first read/write request is from the first tenant; and (v) generating a first result to the first read/write request. At least processing the first read/write request using the connector service and the first node is performed by computer software running on computer hardware.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram view of a first embodiment of a system according to the present invention;

FIG. 2 is a flowchart showing a first embodiment method performed, at least in part, by the first embodiment system;

FIG. 3 is a block diagram view of a machine logic (e.g., software) portion of the first embodiment system;

FIG. 4 is a flowchart showing a second embodiment method performed by a second embodiment of a system according to the present invention;

FIG. 5 is a block diagram view of the second embodiment of the system;

FIG. 6 shows lookup tables generated by a third embodiment of the system according to the present invention; and

FIG. 7 is a flowchart showing a third embodiment method performed by a fourth embodiment of a system according to the present invention.

DETAILED DESCRIPTION

Configuration of a multi-tenant distributed file system on a node. Various tenants and tenant clusters are correlated to a distributed file system, and the distributed file system communicates with various tenants through a connector service. The entire distributed file system exists on a physical node. This Detailed Description section is divided into the following sub-sections: (i) Hardware and Software Environment; (ii) Example Embodiment; (iii) Further Comments and/or Embodiments; and (iv) Definitions.

I. Hardware and Software Environment

The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

An embodiment of a possible hardware and software environment for software and/or methods according to the present invention will now be described in detail with reference to the Figures. FIG. 1 is a functional block diagram illustrating various portions of networked computers system 100, including: multi-tenant configuration sub-system 102; user sub-system 104; virtual container sub-system 106; virtual container sub-system 108; connector service 112; and communication network 114. Multi-tenant configuration sub-system 102 contains: multi-tenant configuration computer 200; display device 212; and external devices 214. Multi-tenant configuration computer 200 contains: communication unit 202; processor set 204; input/output (I/O) interface set 206; memory device 208; and persistent storage device 210. Memory device 208 contains: random access memory (RAM) devices 216; and cache memory device 218. Persistent storage device 210 contains: multi-tenant configuration program 300. Virtual container sub-system 108 includes: software library framework 110.

Multi-tenant configuration sub-system 102 is, in many respects, representative of the various computer sub-systems in the present invention. Accordingly, several portions of multi-tenant configuration sub-system 102 will now be discussed in the following paragraphs.

Multi-tenant configuration sub-system 102 may be a laptop computer, a tablet computer, a netbook computer, a personal computer (PC), a desktop computer, a personal digital assistant (PDA), a smart phone, or any programmable electronic device capable of communicating with client sub-systems via communication network 114. Multi-tenant configuration program 300 is a collection of machine readable instructions and/or data that is used to create, manage, and control certain software functions that will be discussed in detail, below, in the Example Embodiment sub-section of this Detailed Description section.

Multi-tenant configuration sub-system 102 is capable of communicating with other computer sub-systems via communication network 114. Communication network 114 can be, for example, a local area network (LAN), a wide area network (WAN) such as the Internet, or a combination of the two, and can include wired, wireless, or fiber optic connections. In general, communication network 114 can be any combination of connections and protocols that will support communications between multi-tenant configuration sub-system 102 and client sub-systems.

Multi-tenant configuration sub-system 102 is shown as a block diagram with many double arrows. These double arrows (no separate reference numerals) represent a communications fabric, which provides communications between various components of multi-tenant configuration sub-system 102. This communications fabric can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications processors, and/or network processors, etc.), system memory, peripheral devices, and any other hardware components within a system. For example, the communications fabric can be implemented, at least in part, with one or more buses.

Memory device 208 and persistent storage device 210 are computer readable storage media. In general, memory device 208 can include any suitable volatile or non-volatile computer readable storage media. It is further noted that, now and/or in the near future: (i) external devices 214 may be able to supply some, or all, memory for multi-tenant configuration sub-system 102; and/or (ii) devices external to multi-tenant configuration sub-system 102 may be able to provide memory for multi-tenant configuration sub-system 102.

Multi-tenant configuration program 300 is stored in persistent storage device 210 for access and/or execution by one or more processors of processor set 204, usually through memory device 208. Persistent storage device 210: (i) is at least more persistent than a signal in transit; (ii) stores the program (including its soft logic and/or data) on a tangible medium (such as magnetic or optical domains); and (iii) is substantially less persistent than permanent storage. Alternatively, data storage may be more persistent and/or permanent than the type of storage provided by persistent storage device 210.

Multi-tenant configuration program 300 may include both substantive data (that is, the type of data stored in a database) and/or machine readable and performable instructions. In this particular embodiment (i.e., FIG. 1), persistent storage device 210 includes a magnetic hard disk drive. To name some possible variations, persistent storage device 210 may include a solid-state hard drive, a semiconductor storage device, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, or any other computer readable storage media that is capable of storing program instructions or digital information.

The media used by persistent storage device 210 may also be removable. For example, a removable hard drive may be used for persistent storage device 210. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer readable storage medium that is also part of persistent storage device 210.

Communication unit 202, in these examples, provides for communications with other data processing systems or devices external to multi-tenant configuration sub-system 102. In these examples, communication unit 202 includes one or more network interface cards. Communication unit 202 may provide communications through the use of either or both physical and wireless communications links. Any software modules discussed herein may be downloaded to a persistent storage device (such as persistent storage device 210) through a communications unit (such as communication unit 202).

I/O interface set 206 allows for input and output of data with other devices that may be connected locally in data communication with multi-tenant configuration computer 200. For example, I/O interface set 206 provides a connection to external devices 214. External devices 214 will typically include devices, such as a keyboard, a keypad, a touch screen, and/or some other suitable input device. External devices 214 can also include portable computer readable storage media, such as, for example, thumb drives, portable optical or magnetic disks, and memory cards. Software and data used to practice embodiments of the present invention (e.g., multi-tenant configuration program 300) can be stored on such portable computer readable storage media. In these embodiments, the relevant software may (or may not) be loaded, in whole or in part, onto persistent storage device 210 via I/O interface set 206. I/O interface set 206 also connects in data communication with display device 212.

Display device 212 provides a mechanism to display data to a user and may be, for example, a computer monitor or a smart phone display screen.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus, the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

II. Example Embodiment

FIG. 2 shows flowchart 250 depicting a method according to the present invention.

FIG. 3 shows multi-tenant configuration program 300, which performs at least some of the method operations of flowchart 250. This method and associated software will now be discussed, over the course of the following paragraphs, with extensive reference to FIG. 2 (for the method operation blocks) and FIG. 3 (for the software blocks).

Processing begins at operation S255, where receive request module (“mod”) 302 receives a set of requests. In some embodiments of the present invention, receive request mod 302 receives a set of requests from a set of requestors. Examples of a requestor include, but are not limited to, a software library framework, a virtual container, and/or a user. In some embodiments, a set of requests is a set of input/output (“I/O”) requests. In further embodiments, a set of requests is a set of read/write requests. In some of these embodiments, a set of requests is a set of I/O read/write requests. An example of a virtual container is Docker. An example of a software library framework is Hadoop. In further embodiments, receive request mod 302 receives a set of requests from a set of dynamic instantiations of a requestor.

In some embodiments, a requestor is a first distributed file system. In some of these embodiments, a first distributed file system is not POSIX compatible. In further embodiments, a first distributed file system is organized using a first interface standard. In some embodiments, a set of requests relates to a second distributed file system. In some of these embodiments, a second distributed file system is POSIX compatible. In further embodiments, a second distributed file system is organized using a second interface standard. Alternatively, in some embodiments: (i) a first distributed file system is POSIX compatible; and (ii) a second distributed file system is not POSIX compatible. In further alternative embodiments, neither a first distributed file system nor a second distributed file system is POSIX compatible, but the first distributed file system and the second distributed file system are organized using different interface standards.

Processing proceeds to operation S260, where determine directory mod 304 determines a set of directories corresponding to a set of requestors. A directory is a structure for organization of a set of computer files. A directory is sometimes also called a path, a folder, and/or a drawer. A directory can be expressed in various forms, including: (i) parent_folder/child_folder/file.extension; and/or (ii) Parent Folder>Child Folder>File. In some embodiments, determine directory mod 304 determines a set of directories corresponding to a set of tenant identifiers. In other embodiments, determine directory mod 304 determines a set of directories corresponding to a set of tenant identifiers by assigning a directory to a set of requestors. In further embodiments, determine directory mod 304 determines a set of directories corresponding to a set of tenant identifiers by assigning a subdirectory to a set of requestors. In some embodiments, a first requestor in a set of requestors corresponds to a first directory. In other embodiments, a set of requestors share a first directory. In some embodiments, determine directory mod 304 determines a set of directories corresponding to a set of requestors from which receive request mod 302 received a set of requests in operation S255.
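As an illustration of operation S260, the following is a minimal sketch, not the claimed implementation, of assigning a subdirectory to each tenant identifier. The root path /dfs/tenants, the function name determine_directory, and the shared flag are hypothetical names introduced only for this example.

```python
from pathlib import PurePosixPath

# Hypothetical root under which every tenant receives a subdirectory.
TENANT_ROOT = PurePosixPath("/dfs/tenants")


def determine_directory(tenant_id: str, shared: bool = False) -> PurePosixPath:
    """Return the directory assigned to a tenant identifier.

    With shared=True every requestor maps to one common directory,
    mirroring the embodiment in which a set of requestors share a
    first directory.
    """
    # Reject identifiers that could point outside the tenant root.
    if not tenant_id or "/" in tenant_id or tenant_id in {".", ".."}:
        raise ValueError(f"invalid tenant identifier: {tenant_id!r}")
    if shared:
        return TENANT_ROOT / "shared"
    return TENANT_ROOT / tenant_id


# One directory per requestor (the "first requestor, first directory" case).
directories = {t: determine_directory(t) for t in ("tenant-a", "tenant-b")}
```

Keeping the mapping a pure function of the tenant identifier means the same directory is derived on every request without a separate lookup table; an embodiment that stores the mapping in a table would work equally well.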

Processing proceeds to operation S265, where determine tenant identifier mod 306 determines a set of tenant identifiers corresponding to a set of requests. In some embodiments, determine tenant identifier mod 306 determines a set of tenant identifiers for a set of requestors that are dynamic instantiations. In alternative embodiments, determine tenant identifier mod 306 determines a set of tenant identifiers for a set of virtual containers. In further embodiments, determine tenant identifier mod 306 determines a set of tenant identifiers for a set of software library frameworks. Alternatively, determine tenant identifier mod 306 determines a set of tenant identifiers for a set of users. In some embodiments, determine tenant identifier mod 306 determines a set of tenant identifiers for a set of instances of a set of tenants. In some embodiments, determine tenant identifier mod 306 determines a set of tenant identifiers corresponding to a set of requests received by receive request mod 302 in operation S255. Alternatively, determine tenant identifier mod 306 determines a set of tenant identifiers corresponding to a set of directories determined by determine directory mod 304 in operation S260.

Processing proceeds to operation S270, where assign connector service mod 308 assigns a connector service. In some embodiments, a connector service is the only connector service on a computer system. Alternatively, a connector service is the only connector service associated with a first distributed file system and a second distributed file system. In some of these embodiments, a connector service directs requests from a set of requestors on a first distributed file system to a second distributed file system. In other embodiments, assign connector service mod 308 assigns a connector service based, at least in part, on a set of tenant identifiers. In further embodiments, assign connector service mod 308 assigns a connector service based, at least in part, on a set of directories. A connector service is sometimes also called a connection server. A connector service directs a set of requests through a set of appropriate channels. A connection server may also perform functions including, but not limited to: (i) authenticate a set of users; (ii) entitle a set of users to a set of resources; (iii) assign a set of packages to a set of resources; (iv) manage local and/or remote sessions; (v) establish a set of secure connections; and/or (vi) apply policies. In some embodiments, assign connector service mod 308 assigns a connector service based, at least in part, on a set of requestors of a set of requests received by receive request mod 302 in operation S255. In other embodiments, assign connector service mod 308 assigns a connector service based, at least in part, on a set of requests received by receive request mod 302 in operation S255. In further embodiments, assign connector service mod 308 assigns a connector service based, at least in part, on a set of directories determined by determine directory mod 304 in operation S260. In alternative embodiments, assign connector service mod 308 assigns a connector service based, at least in part, on a set of tenant identifiers determined by determine tenant identifier mod 306 in operation S265.
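The routing role of a connector service can be sketched as follows. This is an assumption-laden illustration rather than the connector service of the embodiments: the class ConnectorService, its routes table, and the methods assign and route are hypothetical names, and authentication is reduced to a lookup of known tenant identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConnectorService:
    """A single connector between requestors and a second file system."""

    # Hypothetical table: tenant identifier -> directory on the second
    # distributed file system.
    routes: Dict[str, str] = field(default_factory=dict)

    def assign(self, tenant_id: str, directory: str) -> None:
        """Associate this connector with a tenant identifier and directory."""
        self.routes[tenant_id] = directory

    def route(self, tenant_id: str, relative_path: str) -> str:
        """Translate a tenant-relative path into a path on the second system."""
        if tenant_id not in self.routes:
            # Stand-in for the authentication/entitlement functions.
            raise PermissionError(f"unknown tenant identifier: {tenant_id!r}")
        base = self.routes[tenant_id].rstrip("/")
        return f"{base}/{relative_path.lstrip('/')}"


connector = ConnectorService()
connector.assign("tenant-a", "/dfs/tenants/tenant-a")
routed = connector.route("tenant-a", "logs/part-0")
```

Because every tenant is registered on the same ConnectorService object, the sketch corresponds to the embodiment in which one connector service is the only connector service on a computer system.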

Processing proceeds to operation S275, where determine node mod 310 determines a node corresponding to a set of requestors. In some embodiments, determine node mod 310 determines that a first node corresponds to each requestor in a set of requestors. In some of these embodiments, determine node mod 310 determines that a physical node corresponds to a set of requestors. In other embodiments, determine node mod 310 determines that a virtual node corresponds to a set of requestors. In alternative embodiments, determine node mod 310 determines a node corresponding to a set of requestors by assigning each requestor in the set of requestors to a first node. In some embodiments, determine node mod 310 determines a node corresponding to a set of requests. In further embodiments, determine node mod 310 determines a node corresponding to a set of tenant identifiers. In other embodiments, determine node mod 310 determines a node based, at least in part, on a connector service. In alternative embodiments, determine node mod 310 determines a node based, at least in part, on a one-to-one relationship between the node and a connector service. In other embodiments, determine node mod 310 maps a path between a connector service and a node. In some embodiments, determine node mod 310 determines a node corresponding to a set of requestors from which receive request mod 302 received a set of requests in operation S255. In other embodiments, determine node mod 310 determines a node corresponding to a set of requests received by receive request mod 302 in operation S255. In further embodiments, determine node mod 310 determines a node corresponding to a set of directories determined by determine directory mod 304 in operation S260. In alternative embodiments, determine node mod 310 determines a node corresponding to a set of tenant identifiers determined by determine tenant identifier mod 306 in operation S265. Alternatively, determine node mod 310 determines a node based, at least in part, on a connector service assigned by assign connector service mod 308 in operation S270.

Processing proceeds to operation S280, where process request mod 312 processes a set of requests. In some embodiments, process request mod 312 processes a set of requests based, at least in part, on a set of tenant identifiers. In other embodiments, process request mod 312 processes a set of requests based, at least in part, on a node. In further embodiments, process request mod 312 processes a set of requests based, at least in part, on a directory. In some embodiments, process request mod 312 mounts a first distributed file system to a second distributed file system. In alternative embodiments, process request mod 312 processes a set of requests based, at least in part, on a connector service. For a read request, process request mod 312 reads a set of data from a storage. For a write request, process request mod 312 modifies a set of data in a storage. For an input request, process request mod 312 receives a set of data. For an output request, process request mod 312 transmits a set of data. In some embodiments, process request mod 312 processes a set of requests received by receive request mod 302 in operation S255. In other embodiments, process request mod 312 processes a set of requests based, at least in part, on a set of tenant identifiers determined by determine tenant identifier mod 306 in operation S265. In further embodiments, process request mod 312 processes a set of requests based, at least in part, on a node determined by determine node mod 310 in operation S275. In other embodiments, process request mod 312 processes a set of requests based, at least in part, on a set of directories determined by determine directory mod 304 in operation S260. In alternative embodiments, process request mod 312 processes a set of requests based, at least in part, on a connector service assigned by assign connector service mod 308 in operation S270.
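The read and write cases of operation S280 can be illustrated with the sketch below. It assumes, purely for the example, that the second distributed file system is POSIX compatible and is stood in for by a local temporary directory; the function process_request and its parameters are hypothetical names, not part of the described program.

```python
import tempfile
from pathlib import Path


def process_request(tenant_dir: Path, kind: str, relative_path: str,
                    data: bytes = b"") -> bytes:
    """Process one read or write request inside a tenant's directory."""
    base = tenant_dir.resolve()
    target = (base / relative_path).resolve()
    # Keep each tenant confined to its own directory.
    if base not in target.parents:
        raise PermissionError("path escapes the tenant directory")
    if kind == "write":
        # A write request modifies a set of data in storage.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return b""
    if kind == "read":
        # A read request reads a set of data from storage.
        return target.read_bytes()
    raise ValueError(f"unsupported request kind: {kind!r}")


# Usage: a temporary directory stands in for the tenant's directory.
with tempfile.TemporaryDirectory() as root:
    tenant_dir = Path(root) / "tenant-a"
    tenant_dir.mkdir()
    process_request(tenant_dir, "write", "data/part-0", b"hello")
    payload = process_request(tenant_dir, "read", "data/part-0")
```

The containment check is one possible way to keep tenants isolated when several of them share a node; the source does not prescribe a particular isolation mechanism.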

Processing terminates at operation S285, where generate result mod 314 generates a set of results. In some embodiments of the present invention, generate result mod 314 generates a set of results for a set of requests. In some embodiments, generate result mod 314 generates a set of results to a set of read requests by generating a set of messages including a set of data. In some embodiments, generate result mod 314 generates a set of results to a set of write requests by generating a set of new data entries. In some embodiments, generate result mod 314 generates a set of results to a set of input requests by storing a set of data that was received. In some embodiments, generate result mod 314 generates a set of results to a set of output requests by generating a set of messages. In other embodiments, generate result mod 314 generates results for a first distributed file system that is not POSIX compatible. In further embodiments, generate result mod 314 generates a set of results for a first distributed file system that is Hadoop. In other embodiments, a result includes, but is not limited to, a new data entry and/or a message with a set of data. In some embodiments, generate result mod 314 generates a set of results to a set of requests received by receive request mod 302 in operation S255.

III. Further Comments and/or Embodiments

Some embodiments of the present invention recognize the following facts, potential problems, and/or potential areas for improvement with respect to the current state of the art: (i) managing a set of nodes, a set of connector services, and/or a set of directories corresponding to a set of tenant identifiers leads to an exponential increase in resources; (ii) various operating systems handle a set of nodes, a set of connector services, and/or a set of directories in a multitude of fashions; (iii) some distributed file systems (“DFSs”) are not portable operating system interface (“POSIX”) compatible; (iv) some DFSs cannot be mounted; and/or (v) hyper-convergence infrastructures attempt to decrease resource usage. Conventional means of managing a set of nodes, a set of connector services, and/or a set of directories corresponding to a set of tenant identifiers require individual nodes and individual directories corresponding to each tenant identifier.

FIG. 4 shows flowchart 400 depicting a method according to the present invention. Processing begins at operation S405, where a multi-tenant configuration sub-system receives an I/O request from a Hadoop container instance. Processing proceeds to operation S410, where a multi-tenant configuration sub-system isolates a set of tenant identifiers for a Hadoop container instance. Processing proceeds to operation S415, where a multi-tenant configuration sub-system recognizes a Hadoop container instance based, at least in part, on a set of tenant identifiers. Processing proceeds to operation S420, where a multi-tenant configuration sub-system checks a set of permissions for a Hadoop container instance. Processing terminates at operation S425, where a multi-tenant configuration sub-system handles an I/O request.
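
As a non-limiting illustration, the sequence of flowchart 400 might be sketched in Python as follows. The registry contents, the tenant and instance names, and the function handle_io_request are hypothetical and are assumed only for this sketch; the disclosure does not prescribe how tenant identifiers or permissions are stored.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class IORequest:
    tenant_id: str    # identifier carried with the request (assumed field)
    operation: str    # "read" or "write"
    path: str


# Hypothetical registries: tenant identifier -> Hadoop container instance,
# and instance -> operations that instance is permitted to perform.
INSTANCE_BY_TENANT: Dict[str, str] = {
    "tenant-a": "hadoop-instance-1",
    "tenant-b": "hadoop-instance-2",
}
PERMISSIONS: Dict[str, Set[str]] = {
    "hadoop-instance-1": {"read", "write"},
    "hadoop-instance-2": {"read"},
}


def handle_io_request(request: IORequest) -> str:
    """Walk one received I/O request (S405) through S410 to S425."""
    tenant_id = request.tenant_id                      # S410: isolate tenant identifier
    instance = INSTANCE_BY_TENANT.get(tenant_id)       # S415: recognize the instance
    if instance is None:
        raise LookupError(f"unknown tenant identifier: {tenant_id}")
    if request.operation not in PERMISSIONS.get(instance, set()):  # S420: check permissions
        raise PermissionError(f"{instance} may not {request.operation}")
    return f"{instance}: {request.operation} {request.path}"       # S425: handle the request
```

In this sketch an unrecognized tenant identifier and a disallowed operation are rejected before any I/O is handled.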

FIG. 5 shows a functional block diagram of system 500, including: Hadoop instance 502; Hadoop instance 504; Hadoop instance 506; connector service 508; distributed file system 510; and physical node 512. Communication between each of Hadoop instance 502, Hadoop instance 504, and Hadoop instance 506 and distributed file system 510 traverses through connector service 508. By existing on physical node 512, distributed file system 510 can process all communications through connector service 508.

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) isolating a set of DFS instance data; (ii) isolating a set of Hadoop instance data; (iii) introducing a multi-tenant recognition module in a DFS connector service; and/or (iv) providing a multi-tenant capability for a hyper-converged DFS. A hyper-converged DFS is sometimes also referred to as a multi-tenant DFS. In some embodiments of the present invention, a multi-tenant recognition module incorporates operation S410 and operation S415 of FIG. 4. In other embodiments, connector service 508 in FIG. 5 performs operation S410 and/or operation S415 of FIG. 4. In further embodiments, multi-tenant configuration sub-system provides a connector service and a physical node in a one-to-one relationship. In alternative embodiments, multi-tenant configuration sub-system configures a set of DFS instances with a set of private network addresses. Alternatively, a multi-tenant configuration sub-system configures a set of DFS instances with a private network address. In some embodiments, a multi-tenant configuration sub-system isolates a DFS instance in a directory. In further embodiments, a multi-tenant configuration sub-system isolates a DFS instance in a directory based, at least in part, on a tenant. In other embodiments, a multi-tenant configuration sub-system isolates a set of operations for a DFS instance in a directory.

FIG. 6 shows two tables. The first table in FIG. 6 is an instance container mapping list. Two instances with three containers are shown, resulting in six tenant IDs. These six tenant IDs are all mapped to one node. The second table in FIG. 6 is a reverse instance container mapping list. The same six tenant IDs are shown. However, the second table is sorted to determine a corresponding instance.
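
As a non-limiting illustration, the two tables of FIG. 6 might be modeled in Python as follows. The column names, the identifier formats, and the reading of "two instances with three containers" as three containers per instance are assumptions made for this sketch; the actual layout of the tables is defined by FIG. 6.

```python
from typing import Dict, List, NamedTuple


class MappingRow(NamedTuple):
    instance_id: str
    container_id: str
    tenant_id: str
    node: str


def build_mapping_list(instance_ids: List[str],
                       containers_per_instance: int,
                       node: str) -> List[MappingRow]:
    """Build a forward list: one row, and one tenant ID, per container."""
    rows: List[MappingRow] = []
    tenant_number = 1
    for instance_id in instance_ids:
        for index in range(1, containers_per_instance + 1):
            rows.append(MappingRow(
                instance_id,
                f"{instance_id}-container-{index}",   # hypothetical container name
                f"tenant-{tenant_number}",            # hypothetical tenant ID format
                node,
            ))
            tenant_number += 1
    return rows


def build_reverse_mapping(rows: List[MappingRow]) -> Dict[str, str]:
    """Build a reverse list keyed by tenant ID, sorted for instance lookup."""
    return {row.tenant_id: row.instance_id
            for row in sorted(rows, key=lambda r: r.tenant_id)}


# Two instances, three containers each, all mapped to one node.
mapping_list = build_mapping_list(["instance-1", "instance-2"], 3, "node-1")
reverse_mapping = build_reverse_mapping(mapping_list)
```

With these inputs the forward list holds six rows on a single node, and the reverse mapping resolves any of the six tenant IDs to its instance.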

FIG. 7 shows flowchart 700 depicting a method according to the present invention. Processing begins at operation S705, where a multi-tenant configuration sub-system receives an I/O read/write request from a Hadoop job in a container. Processing proceeds to operation S710, where a multi-tenant configuration sub-system retrieves a container IP address from an I/O request. Processing proceeds to operation S715, where a multi-tenant configuration sub-system retrieves a physical node IP address. Processing proceeds to operation S720, where a multi-tenant configuration sub-system queries an instance container mapping list based on a container IP and a node IP. Processing proceeds to operation S725, where a multi-tenant configuration sub-system retrieves an instance ID. Processing proceeds to operation S730, where a multi-tenant configuration sub-system retrieves an instance directory. Processing proceeds to operation S735, where a multi-tenant configuration sub-system transforms a set of I/O pathways. Processing terminates at operation S740, where a multi-tenant configuration sub-system handles a set of I/O requests.
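
As a non-limiting illustration, operations S710 through S735 might be sketched in Python as follows. The IP addresses, instance IDs, directory layout, and the function transform_path are hypothetical; the disclosure does not specify how an I/O pathway is transformed, and prefixing the requested path with the instance directory is only one plausible reading.

```python
import posixpath
from typing import Dict, Tuple

# Hypothetical instance container mapping list:
# (container IP, physical node IP) -> instance ID.
MAPPING: Dict[Tuple[str, str], str] = {
    ("10.0.0.11", "192.168.1.5"): "instance-1",
    ("10.0.0.21", "192.168.1.5"): "instance-2",
}

# Hypothetical instance directories inside the shared DFS.
INSTANCE_DIRS: Dict[str, str] = {
    "instance-1": "/dfs/tenants/instance-1",
    "instance-2": "/dfs/tenants/instance-2",
}


def transform_path(container_ip: str, node_ip: str, requested_path: str) -> str:
    """Map a tenant-visible path to a path under the tenant's instance directory."""
    instance_id = MAPPING[(container_ip, node_ip)]     # S720 and S725
    instance_dir = INSTANCE_DIRS[instance_id]          # S730
    # S735: normalize against the root so ".." segments cannot climb out
    # of the instance directory, then re-root the path under it.
    relative = posixpath.normpath("/" + requested_path).lstrip("/")
    return posixpath.join(instance_dir, relative) if relative else instance_dir
```

Normalizing before joining is a design choice of this sketch: it keeps every transformed path inside the directory of the requesting instance, which is consistent with isolating each tenant in its own directory.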

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) a DFS allows access to a set of files from a variety of hosts; (ii) a DFS allows a set of users to share a set of files across a set of devices; and/or (iii) a DFS is a popular storage system. Examples of DFSs include: IBM General Parallel File System (“GPFS”) File Placement Optimizer (“FPO”), Red Hat Linux, GlusterFS, Lustre, Ceph, and Apache Hadoop Distributed File System (“HDFS”).

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) mounting a DFS; (ii) reading data from a DFS; (iii) writing data to a DFS; (iv) reading data from a DFS using a POSIX application; (v) writing data to a DFS using a POSIX application; (vi) reading data from a DFS using a POSIX application in the DFS ecosystem; and/or (vii) writing data to a DFS using a POSIX application in the DFS ecosystem. Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) determining a set of permissions based, at least in part, on a user ID; (ii) determining a set of permissions based, at least in part, on a group ID; (iii) determining a set of permissions for an operating environment; and/or (iv) determining a set of permissions for an operating system.

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) running a DFS using a POSIX application; (ii) transferring a set of files over a single connector service; (iii) transferring a set of files over a single connector service on a DFS using a POSIX application; and/or (iv) running a hyper-converged DFS using a POSIX application. Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) running a DFS using a non-POSIX application; (ii) transferring a set of files over a single connector service; (iii) transferring a set of files over a single connector service on a DFS using a non-POSIX application; and/or (iv) running a hyper-converged DFS using a non-POSIX application. Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) creating a set of clusters of a set of DFS instances; (ii) creating a set of clusters of a set of DFS instances for a set of users; (iii) assigning a set of network addresses to a set of clusters; (iv) assigning a set of tenant identifiers to a set of clusters; (v) assigning a set of network addresses to a set of clusters, wherein the set of network addresses are not related to a DFS; and/or (vi) assigning a set of tenant identifiers to a set of clusters, wherein the set of network addresses are not related to a DFS.

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) reducing a number of connector services; (ii) using a single connector service; (iii) reducing a number of connector services required to maintain a multi-tenant configuration; (iv) reducing a number of connector services required to maintain a multi-tenant configuration at an exponential level; (v) reducing a number of tenant identifiers corresponding to a number of clients on a DFS; and/or (vi) reducing a number of IP addresses corresponding to a number of clients on a DFS.

In some embodiments of the present invention, a multi-tenant configuration sub-system generates a DFS cluster for a tenant. In further embodiments, a multi-tenant configuration sub-system generates a tenant ID corresponding to a DFS cluster. A DFS cluster is sometimes also referred to as a first distributed file system with multiple requestors and/or multiple tenants. In some of these embodiments, a multi-tenant configuration sub-system assigns a tenant ID to a node.

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) configuring a set of directories in a DFS; (ii) configuring a set of directories in a DFS and restarting a connector service; (iii) creating a set of software library framework instances for a DFS instance; (iv) storing a set of tenant information in a directory in a hyper-converged DFS; (v) recognizing a DFS directory without restarting; (vi) restarting a DFS without creating a new DFS instance; (vii) providing a DFS cluster for a tenant; (viii) maintaining a DFS cluster for a tenant; and/or (ix) isolating a DFS based, at least in part, on a set of hardware resources. Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) generating a user ID when building a software library framework; (ii) generating a user ID when compiling a software library framework; (iii) generating a group ID when building a software library framework; and/or (iv) generating a group ID when compiling a software library framework.

Some embodiments of the present invention may include one, or more, of the following features, characteristics, and/or advantages: (i) managing a hyper-converged big-data DFS; (ii) managing a multi-tenant big-data DFS; (iii) managing a hyper-converged DFS in a cloud system; and/or (iv) managing a hyper-converged DFS in a virtual system.

IV. Definitions

“Present invention” does not create an absolute indication and/or implication that the described subject matter is covered by the initial set of claims, as filed, by any as-amended set of claims drafted during prosecution, and/or by the final set of claims allowed through patent prosecution and included in the issued patent. The term “present invention” is used to assist in indicating a portion or multiple portions of the disclosure that might possibly include an advancement or multiple advancements over the state of the art. This understanding of the term “present invention” and the indications and/or implications thereof are tentative and provisional and are subject to change during the course of patent prosecution as relevant information is developed and as the claims may be amended.

“Embodiment,” see the definition for “present invention.”

“And/or” is the inclusive disjunction, also known as the logical disjunction and commonly known as the “inclusive or.” For example, the phrase “A, B, and/or C,” means that at least one of A or B or C is true; and “A, B, and/or C” is only false if each of A and B and C is false.

A “set of” items means there exists one or more items; there must exist at least one item, but there can also be two, three, or more items. A “subset of” items means there exists one or more items within a grouping of items that contain a common characteristic.

A “plurality of” items means there exists more than one item; there must exist at least two items, but there can also be three, four, or more items.

“Includes” and any variants (e.g., including, include, etc.) means, unless explicitly noted otherwise, “includes, but is not necessarily limited to.”

A “user” or a “subscriber” includes, but is not necessarily limited to: (i) a single individual human; (ii) an artificial intelligence entity with sufficient intelligence to act in the place of a single individual human or more than one human; (iii) a business entity for which actions are being taken by a single individual human or more than one human; and/or (iv) a combination of any one or more related “users” or “subscribers” acting as a single “user” or “subscriber.”

The terms “receive,” “provide,” “send,” “input,” “output,” and “report” should not be taken to indicate or imply, unless otherwise explicitly specified: (i) any particular degree of directness with respect to the relationship between an object and a subject; and/or (ii) a presence or absence of a set of intermediate components, intermediate actions, and/or things interposed between an object and a subject.

A “module” is any set of hardware, firmware, and/or software that operatively works to do a function, without regard to whether the module is: (i) in a single local proximity; (ii) distributed over a wide area; (iii) in a single proximity within a larger piece of software code; (iv) located within a single piece of software code; (v) located in a single storage device, memory, or medium; (vi) mechanically connected; (vii) electrically connected; and/or (viii) connected in data communication. A “sub-module” is a “module” within a “module.”

A “computer” is any device with significant data processing and/or machine readable instruction reading capabilities including, but not necessarily limited to: desktop computers; mainframe computers; laptop computers; field-programmable gate array (FPGA) based devices; smart phones; personal digital assistants (PDAs); body-mounted or inserted computers; embedded device style computers; and/or application-specific integrated circuit (ASIC) based devices.

“Electrically connected” means either indirectly electrically connected such that intervening elements are present or directly electrically connected. An “electrical connection” may include, but need not be limited to, elements such as capacitors, inductors, transformers, vacuum tubes, and the like.

“Mechanically connected” means either indirect mechanical connections made through intermediate components or direct mechanical connections. “Mechanically connected” includes rigid mechanical connections as well as mechanical connections that allow for relative motion between the mechanically connected components. “Mechanically connected” includes, but is not limited to: welded connections; solder connections; connections by fasteners (e.g., nails, bolts, screws, nuts, hook-and-loop fasteners, knots, rivets, quick-release connections, latches, and/or magnetic connections); force fit connections; friction fit connections; connections secured by engagement caused by gravitational forces; pivoting or rotatable connections; and/or slidable mechanical connections.

A “data communication” includes, but is not necessarily limited to, any sort of data communication scheme now known or to be developed in the future. “Data communications” include, but are not necessarily limited to: wireless communication; wired communication; and/or communication routes that have wireless and wired portions. A “data communication” is not necessarily limited to: (i) direct data communication; (ii) indirect data communication; and/or (iii) data communication where the format, packetization status, medium, encryption status, and/or protocol remains constant over the entire course of the data communication.

The phrase “without substantial human intervention” means a process that occurs automatically (often by operation of machine logic, such as software) with little or no human input. Some examples that involve “no substantial human intervention” include: (i) a computer is performing complex processing and a human switches the computer to an alternative power supply due to an outage of grid power so that processing continues uninterrupted; (ii) a computer is about to perform resource intensive processing and a human confirms that the resource-intensive processing should indeed be undertaken (in this case, the process of confirmation, considered in isolation, is with substantial human intervention, but the resource intensive processing does not include any substantial human intervention, notwithstanding the simple yes-no style confirmation required to be made by a human); and (iii) using machine logic, a computer has made a weighty decision (for example, a decision to ground all airplanes in anticipation of bad weather), but, before implementing the weighty decision the computer must obtain simple yes-no style confirmation from a human source.

“Automatically” means “without any human intervention.”

The term “real time” (and the adjective “real-time”) includes any time frame of sufficiently short duration as to provide reasonable response time for information processing as described. Additionally, the term “real time” (and the adjective “real-time”) includes what is commonly termed “near real time,” generally any time frame of sufficiently short duration as to provide reasonable response time for on-demand information processing as described (e.g., within a portion of a second or within a few seconds). These terms, while difficult to precisely define, are well understood by those skilled in the art.

What is claimed is:
 1. A method comprising: determining a first directory corresponding to a first tenant identifier in a set of tenant identifiers, wherein: the first directory is organized using a first interface standard, and the first tenant identifier corresponds to a first tenant of the first directory; assigning a connector service to the first directory and the first tenant identifier; determining a second directory corresponding to the connector service, wherein: the second directory is organized using a second interface standard, a first node contains a first set of files on the second directory, and the first set of files corresponds to the first tenant; processing a first read/write request in a set of read/write requests using the connector service and the first node, wherein the first read/write request is from the first tenant; and generating a first result to the first read/write request; wherein: at least processing the first read/write request using the connector service and the first node is performed by computer software running on computer hardware.
 2. The method of claim 1, further comprising: determining a third directory corresponding to a second tenant identifier in the set of tenant identifiers, wherein: the second tenant identifier corresponds to a second read/write request in the set of read/write requests, and the third directory is organized using the first interface standard; assigning the connector service to the third directory and the second tenant identifier; processing the second read/write request using the connector service and a second node, wherein: a second node contains a second set of files on the second directory, and the second set of files corresponds to the second tenant; and generating a second result to the second read/write request.
 3. The method of claim 2, wherein the second node is the first node.
 4. The method of claim 1, wherein the first result is selected from a group consisting of: a new data entry, and a message with a set of data.
 5. The method of claim 1, wherein the first interface standard is not POSIX compatible.
 6. The method of claim 1, wherein the second interface standard is POSIX compatible.
 7. The method of claim 1, wherein the first directory is organized using an Apache Hadoop Distributed File System (“HDFS”).